Performance of leading large language models in May 2025 in Membership of the Royal College of General Practitioners-style examination questions: a cross-sectional analysis
Background: Large language models (LLMs) have demonstrated substantial potential to support clinical practice. Other than ChatGPT-4 and its predecessors, few LLMs, especially those in the leading and more powerful reasoning-model class, have been subjected to medical specialty examination questions, including in the domain of primary care. This paper aimed to test the capabilities of leading LLMs as of May 2025 (o3, Claude Opus 4, Grok 3, and Gemini 2.5 Pro) in primary care education, specifically in answering Membership of the Royal College of General Practitioners (MRCGP)-style examination questions.

Methods: o3, Claude Opus 4, Grok 3, and Gemini 2.5 Pro were tasked with answering 100 randomly chosen multiple-choice questions from the Royal College of General Practitioners GP SelfTest on 25 May 2025. Questions included textual information, laboratory results, and clinical images. Each model was prompted to answer as a GP in the UK and was provided with the full question information. Each question was attempted once by each model. Responses were scored against the correct answers provided by GP SelfTest.

Results: The total scores of o3, Claude Opus 4, Grok 3, and Gemini 2.5 Pro were 99.0%, 95.0%, 95.0%, and 95.0%, respectively. The average peer score for the same questions was 73.0%.

Discussion: All models performed remarkably well, and all substantially exceeded the average performance of the GPs and GP registrars who had answered the same questions. o3 demonstrated the best performance, while the other leading models performed comparably with one another and not substantially below o3. These findings strengthen the case for LLMs, particularly reasoning models, and especially those specifically trained on primary care clinical data, to support the delivery of primary care.
What would you do if you had a THIRD thumb? Robotic prosthetic allows people to open bottles, pick up objects and even peel a banana with one hand
Human hands have had 10 digits for millions of years. But it seems scientists at the University of Cambridge don't think this is quite sufficient. The experts have created the 'Third Thumb' – a controllable prosthetic that attaches to the edge of the right hand. It lets wearers pick up objects, open drinks bottles, sift through playing cards, peel a banana and even thread a needle – all with just one hand. In their study, human volunteers quickly got used to the extra digit – which could 'advance our motor capabilities beyond current biological limitations'. The Third Thumb is worn on the opposite side of the palm to a person's real thumb and controlled by a pressure sensor placed under each big toe.
AI 'candidate' fails to pass mock radiology boards
Despite the infamous 2016 prophecy of deep learning expert Geoffrey Hinton, artificial intelligence has not yet replaced radiologists. And according to new data, it appears that prediction is still a long way off, as an AI "candidate" recently failed its mock radiology boards. The candidate's results were published this week in The BMJ and compared with those of 26 radiologists who had recently passed the rapid radiographic reporting component of the Fellowship of the Royal College of Radiologists (FRCR) examination. Out of ten mock exams, the AI candidate passed two, achieving an overall accuracy of 79.5%, suggesting that the candidate is not quite "ready to graduate." "Radiologists in the UK are required to pass the Fellowship of the Royal College of Radiologists (FRCR) examination before their completion of training, which allows them to practice independently as radiology consultants. For artificial intelligence to replace radiologists, ensuring that it too can pass the same examination would seem prudent," corresponding author Susan Cheng Shelmerdine, a consultant pediatric radiologist at the Great Ormond Street Hospital for Children in London, and colleagues suggested.
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Cancer Delays in the NHS – Astronomical AI
The key goals of Astronomical AI are to create an AI tool that reduces waiting times for the diagnosis and treatment of lung cancer and improves the workflow of the Oncology department. With this in mind, I came across a BBC Breakfast segment which was followed up with a BBC article titled: 'Cancer care delays: How bad are they in your area?' The segment and article highlight a rise in long waits for cancer therapy over the last four years. The number of people waiting for cancer treatment in the UK has doubled since 2018, and the number waiting more than 62 days from referral to treatment (62 days being the NHS's maximum target wait from urgent cancer referral to the start of treatment) has risen to over 69,000. As a result, patients are dying from long delays between diagnosis and treatment.
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Government > Regional Government > Europe Government > United Kingdom Government (0.77)
AI software that can predict daily A&E admissions is rolled out TODAY
Computer software is being rolled out in the NHS from today that can predict A&E admissions weeks in advance based on factors such as Covid rates and 111 calls. The AI technology will be used in over 100 hospitals with major A&E departments in England, nearly half of all NHS trusts. In a trial at nine trusts, it was found to make forecasts with 'impressive' accuracy by looking at factors including local Covid and flu infection rates, traffic, and 111 call data to model how many people will show up at A&E each day. The software also takes into consideration public holidays such as New Year's Eve, when emergency departments are more likely to fill up. There are plans to incorporate weather data in the future, with cold weather associated with more falls and traffic accidents and hot temperatures linked to a rise in heart problems.
Why aren't patients being told truth about electric shock therapy?
Jacqui Quibbell has suffered from 'crippling periods of depression and suicidal thoughts' throughout her adult life. In 2003, her doctors suggested that Jacqui undergo electro-convulsive therapy (ECT). This involves attaching electrodes to the patient's head and, under general anaesthetic, passing electric shocks through their brain, which is said to 'rewire' it. 'I didn't know much about ECT, I didn't have Google then,' says Jacqui, 57. 'I started suffering memory loss during the treatment and by the time it finished, my short-term memory had disappeared completely and has never come back.'
Beating the cancer backlog is AI's largest test yet
They had been staring at a mammogram for several minutes, and couldn't work out what was wrong. They sent the scan on to an arbitrator radiologist who declared that it was clear. That's when Mia, an artificial intelligence (AI) system developed by British start-up Kheiron Medical, stepped in. The AI immediately highlighted in green an area of concern, a pair of grey clouds which appear to the human eye exactly like the rest of the scan. This section of the scan, Mia concluded, was a sign of cancer.
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine (0.95)
Babylon Health says its AI can appropriately triage 85% of patients
AI healthcare startup Babylon Health believes it can appropriately triage patients in 85 percent of cases. Babylon Health is best known for GP at Hand, a service which is supported by UK health secretary Matt Hancock and integrated into Samsung Health. GP at Hand links patients with health experts 24/7 using video calls and can facilitate any prescriptions to be sent to local pharmacies. The service, however, has been criticised for an AI chatbot which repeatedly gave unsafe advice and for only taking on healthier, often younger individuals while redirecting cash away from local surgeries relied on by older and sicker patients. Correct triaging is essential to ensure patients receive the appropriate care.
- North America > United States > California (0.06)
- Europe > Netherlands > North Holland > Amsterdam (0.06)
- Europe > Italy (0.06)
CT scans, artificial intelligence and COVID-19
That was really interesting, thank you Patrick for joining us. Patrick Brennan: It was a pleasure, thank you. Norman Swan: Professor Patrick Brennan, who is Professor of Diagnostic Imaging at the University of Sydney. I'm Norman Swan, this has been the Health Report on RN. And don't forget the Coronacast, our daily podcast on all things to do with the coronavirus that Tegan Taylor and I present. You can download it by going to Apple Podcasts, the ABC Listen app, or wherever you get your podcasts. I'll see you next week.
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
DeepMind's new AI can spot breast cancer just as well as your doctor
One in eight women will be diagnosed with breast cancer during their lives. In an effort to help with quicker detection, researchers have trained a deep-learning algorithm to spot breast cancer in screening scans as accurately as, or better than, a radiologist. While still at an early stage, the research could eventually help reduce incorrect results in the US and help alleviate the shortage of radiologists in the UK. As early detection is key to treatment, women over the age of 50 are screened in the US and UK even if they don't show signs of the disease. False negatives, when cancer is present but not spotted, can prove deadly, while false positives can be distressing.
- Europe > United Kingdom (0.52)
- North America > United States (0.47)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.95)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.81)